reinforcement learning
Below is a draft for a 10-minute talk at an internal study session.
reinforcement learning
supervised learning
Input and Teacher Data
In Go, this is the game record (kifu).
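As a rough illustration of how a game record becomes input and teacher data, the sketch below unrolls one record into (board, next move) pairs. The record format (a list of coordinates), the board encoding, and the name `record_to_pairs` are assumptions made for this example, and captures are ignored to keep it short.

```python
from typing import List, Tuple

Move = Tuple[int, int]                 # (row, col); hypothetical encoding
Board = Tuple[Tuple[int, ...], ...]    # 0 = empty, 1 = black, 2 = white


def record_to_pairs(moves: List[Move], size: int = 9) -> List[Tuple[Board, Move]]:
    """Unroll one game record into (input board, teacher move) pairs.

    Simplification: stones are only placed; captures are not handled.
    """
    grid = [[0] * size for _ in range(size)]
    pairs = []
    for i, (r, c) in enumerate(moves):
        # Input: the board as it looked before the move was played.
        snapshot = tuple(tuple(row) for row in grid)
        # Teacher data: the move the human actually played.
        pairs.append((snapshot, (r, c)))
        grid[r][c] = 1 if i % 2 == 0 else 2  # black plays first
    return pairs
```

Each move in the record yields one training example, so a single game supplies many examples.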
Who's going to make the teacher data?
People.
Ten or a hundred examples are nowhere near enough.
AlphaGo
160,000 games
28.4 million boards
57.0% accuracy at predicting the human move
self-play
How many times?
state-value network
Take data from the results of self-play.
Only one board is taken from each game.
30 million boards = 30 million games
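The one-board-per-game sampling can be sketched as follows. This is a minimal illustration, not AlphaGo's actual pipeline: the function name `sample_value_data` and the data layout (a list of positions plus a final outcome per game) are assumptions. The usual motivation is that boards from the same game are nearly identical and share the same outcome label, so taking many of them would encourage memorizing games rather than evaluating positions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Position = TypeVar("Position")


def sample_value_data(
    games: Sequence[Tuple[Sequence[Position], int]],
    rng: random.Random,
) -> List[Tuple[Position, int]]:
    """Pick exactly one position from each self-play game.

    games: (positions, outcome) per game, where outcome is +1 or -1
           (hypothetical layout for this sketch).
    Returns one (position, outcome) training example per non-empty game.
    """
    data = []
    for positions, outcome in games:
        if not positions:
            continue  # skip games with no recorded positions
        data.append((rng.choice(positions), outcome))
    return data
```

With this scheme the number of training examples equals the number of games, which is why 30 million boards require 30 million games.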
---
This page is auto-translated from /nishio/強化学習. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.